Introduction to Two-Phase Commit (2PC)
Learn about the two-phase commit and the motivation behind its creation.
Two-phase commit (2PC) is a distributed consensus algorithm that was historically developed in the context of database transactions. A transaction is an abstraction that usually provides the ACID correctness guarantees to the end programmers. The “A” in ACID refers to atomicity—a transaction either commits in total or aborts; in other words, something happens completely or it doesn’t occur at all. When we want different systems to either commit or abort together, we use the 2PC algorithm. In the context of transactions, the consensus is about the binary decision of either committing or aborting among all the participants in the 2PC protocol.
A common use case for 2PC is when a transaction reads and writes multiple database shards, and an application wants to ensure that all such changes happen as one unit. In this case, consensus means that every participant should be on the same page. The 2PC algorithm can be used in any scenario that requires atomicity across all participants.
In real-life scenarios, we know that achieving consensus can be challenging, even for simple decisions like choosing a restaurant among friends. Because the 2PC needs to bring all the participants on the same page (we can't just take the majority), failures such as node or network failures pose significant challenges to the 2PC algorithm.
This chapter will elaborate on transactions a bit more before discussing the 2PC algorithm. We'll then see how different failures, like network and node failures, are managed by the 2PC algorithm.
Transactions and atomic commit#
In a distributed system, transactions are used to manage the reading and writing of data among nodes. A transaction is a unit of work that reads and writes data from one or more nodes. Achieving atomic commit is essential to maintain consistency and reliability, meaning that all transaction changes are committed, or none of them are. However, ensuring atomic commit is challenging in a distributed system because it requires coordination among multiple nodes.
For example, consider a banking application that transfers funds between accounts. The transaction involves deducting the amount from the sender's account and adding it to the receiver's account. To ensure atomic commit, the transaction must be coordinated across multiple nodes to ensure that the debit and credit operations occur atomically. The withdrawal and deposit both happen, or they both should not happen at all. (In the context of ACID, the “C” refers to the application's notion of consistency. In our example, the invariant is that across all accounts, the credit and debit amounts should always be equal. Atomicity helps the application achieve and keep its consistent state.)
1 of 6
2 of 6
3 of 6
4 of 6
5 of 6
6 of 6
Two-phase commit protocol#
The two-phase commit protocol (2PC) is a widely used consensus algorithm for achieving atomic commit in a distributed system to make sure that all nodes commit or abort. 2PC works in two steps. In the first step, the agreed-upon value is shared and votes are gathered. In the second step, the coordinator sends a commit message to all nodes to finalize the transaction.
Real-world example#
Many real-world databases use the 2PC algorithm. Google's Spanner is one example. The following illustration shows Spanner's architecture, where different Paxos groups are in charge of specific tablet replication. When we need to conduct a transaction across Paxos groups, one replica in each group is designated as a participant leader (we can think of them as participants in the 2PC protocol). Among participant leaders, one leader initiates the transaction and acts as the coordinator.
Point to ponder
Question 3
Could Spanner use Paxos over Paxos instead of using 2PC over Paxos for transactions?
No. In a typical transaction, we might be reading/writing to multiple Paxos groups (shards). Using Paxos over Paxos (with a simple majority rule) might mean that the top-level Paxos will move on when a majority is available. This is not preferred for transactions because of the following reasons:
-
Usually, different data items need to be locked to provide appropriate isolation, and if that is not done, the transaction can’t move on.
-
Transactions are short-lived, and there can be multiple combinations of shards involved in different transactions over time. We don’t want to use so many temporary Paxos groups for any pending operations to catch up for performance reasons.
3 of 3
Initial sketch of the 2PC algorithm#
The 2PC protocol is designed with the assumption of a leader or coordinator node that is responsible for managing the state, gathering votes, and serving as the main reference point for the agreement process. The other nodes in the system are known as cohorts (participants), which typically operate on replicas against which transactions are performed. Both the coordinator and cohorts maintain local operation logs for every step that is executed. The logs help identify the protocol's last executed step, allowing nodes to resume from where they left off in case of a failure. Participants vote on a proposed value, typically an identifier for the distributed transaction, to determine whether to accept or reject it. This value can be used to retrieve the full data associated with the transaction and is also applicable in other contexts.
Point to ponder
Question
Are two-phase commit (2PC) and two-phase locking (2PL) the same?
Two-phase commit (2PC) and two-phase locking (2PL) are two different concepts and should not be confused with one another. 2PC is designed to ensure atomic commit in a distributed transaction, while 2PL is designed to provide serializable isolation between concurrent transactions accessing the shared content between transactions. To prevent misunderstandings, consider them as separate concepts; don’t be misled by their similar names!
In the context of ACID, the 2PC gives us “A” (atomicity), while 2PL provides us “I” (isolation).
The coordinator node can be selected in different ways: by receiving a request to execute the transaction, by being manually assigned, or by being fixed for the system’s entire lifespan. The coordinator role can be transferred to another participant for better reliability or performance when one round of 2PC is complete. For the next round, someone else can be a coordinator. The protocol does not impose any restrictions on the coordinator's role.
The 2PC protocol, as its name suggests, consists of two phases. Using two phases is important for achieving atomicity, which means that either all nodes commit the transaction or all nodes abort the transaction. These phases are the prepare and commit phases. Let's discuss them in detail in the next lesson.
Quiz on Consensus Fundamentals
Working of the Two-Phase Commit Protocol